Abstract

Background: Neural stem cells offer a potential treatment for neurodegenerative disorders such as Alzheimer's disease (AD). While much progress has been made in understanding neural stem cell function, a precise description of the molecular mechanisms regulating neural stem cells has not yet been established. This lack of knowledge is a major barrier to the discovery of therapeutic uses of neural stem cells. In this paper, the regulation of mouse neural stem cell (NSC) differentiation by tmem59 is explored at the genome level.

Results: We identified regulators of tmem59 during the differentiation of mouse NSCs from a compendium of expression profiles. Based on the microarray experiment, we developed the parallelized SWNI algorithm to reconstruct gene regulatory networks of mouse neural stem cells. From the inferred tmem59-related gene network of 36 genes, pou6f1 was identified as a significant regulator of tmem59 and might play an important role in the differentiation of NSCs in the mouse brain. Four pathways are shown in the gene network, indicating that tmem59 is located downstream of the signalling pathway. Real-time RT-PCR results showed that over-expression of pou6f1 significantly up-regulated tmem59 expression in the C17.2 NSC line. 16 out of 36 predicted genes in our constructed network have been reported to be AD-related, including Ace, aqp1, arrdc3, cd14, cd59a, cds1, cldn1, cox8b, defb11, folr1, gdi2, mmp3, mgp, myrip, Ripk4, rnd3, and sncg. The localization of tmem59-related genes and functionally related gene groups based on Gene Ontology (GO) annotation was also identified.

Conclusions: Our findings suggest that the expression of tmem59 is an important factor contributing to AD. The parallelized SWNI algorithm significantly increased the efficiency of network reconstruction. This study enables us to highlight novel genes that may be involved in NSC differentiation and provides a shortcut to identifying genes for AD.




Background 

One of the main goals of systems biology is to determine biological networks by combining high-performance computing methods with high-throughput data [1,2]. Compared with traditional biology, whose basic strategy is to decipher biological functions by concentrating efforts on a very limited set of molecules, this system-centric approach has had enormous success in producing complex biological networks composed of various types of molecules (genes, proteins, microRNAs, etc.) from large amounts of data [3].

Microarray technology facilitates large-scale surveys of gene expression for whole-genome mapping and gene expression analysis under various conditions [4]. A major focus of microarray data analysis is the reconstruction of gene regulatory networks, which aims to find new gene functions and provide insights into the transcriptional regulation that underlies biological processes [5]. A wide variety of approaches have been proposed to infer gene regulatory networks from microarray data. These approaches are based on different theories, including Boolean networks [6], Bayesian networks [7], relevance networks [8], graphical models [9], genetic algorithms [10], neural networks [11], controlled language-generating automata [12], linear differential equations [13], and nonlinear differential equations [14]. Two difficulties arise when constructing gene networks from gene expression data. First, a single set of gene expression data contains a limited number of time points under a specific condition, so determining the gene regulatory network becomes an ill-posed problem that is difficult to overcome. Second, while microarray experiments collect an increasing amount of data to be correlated, network reconstruction is an NP-hard problem. Therefore, applying the statistical framework to a large set of genes requires a prohibitive amount of computing time on a single CPU. A fundamental problem with sequential algorithms is their limited ability to handle large data sets within reasonable time and memory resources.

Neurodegenerative disorders, including Alzheimer's disease (AD), Parkinson's disease, and Huntington's disease, involve a progressive loss of neurons. Recently, transplantation of NSCs into the adult brain has been proposed as one of the potential therapies for neurodegenerative disorders [15]. NSCs are multipotent progenitor cells with long-term self-renewal and differentiation capabilities that generate the three major types of central nervous system (CNS) cell: neurons, astrocytes and oligodendrocytes [16]. They are identified as neuroepithelial cells extending from the ventricle to the basal lamina of the pial surface in the initial stage of brain development. During histogenesis, radial glial stem cells divide asymmetrically to produce neurons and give rise to astrocytes. NSCs then become neural progenitor cells residing in the neurogenic regions of the adult brain: the sub-ventricular zone (SVZ) and the sub-granular zone (SGZ) [17-20].

So far, stem cell therapy for neurodegenerative disorders remains a challenging goal [21]. The mechanisms that control the proliferation, differentiation, migration and integration of NSCs are still poorly understood. Comprehending the gene regulatory network of NSCs by integrating data and analysing them with efficient algorithms is a crucial part of systems biology.

Moreover, mouse transmembrane protein 59 (TMEM59) is an uncharacterized single transmembrane protein. Previously, our in vitro study suggested that TMEM59 is differentially expressed during differentiation of primary NSCs from Sprague-Dawley rat striatum [22]. In particular, down-regulation of TMEM59 by RNA interference in the mouse C17.2 neural stem cell line increases the differentiation of NSCs into neurons and astrocytes [23]. Our study indicated that TMEM59 is related to the differentiation and status maintenance of NSCs. So far, the functions of TMEM59 have not been reported. Exploring the tmem59-related gene regulatory network of NSCs would help us better understand the molecular mechanism underlying NSC differentiation.

In this paper, we constructed gene regulatory networks of mouse NSCs using a parallel implementation of the stepwise network inference method. By integrating our microarray data with public data, the regulation of mouse NSC differentiation by tmem59 is explored throughout the genome. The important pathways and the core gene, pou6f1, were investigated by real-time RT-PCR, which suggested that over-expression of pou6f1 significantly up-regulates tmem59 expression. We also show that many genes in the tmem59-related gene network have been implicated in the mechanism of AD. The findings enable us to highlight novel genes that may be involved in NSC differentiation and provide a shortcut to identifying genes for AD.

Methods 

Original data 

Microarrays simultaneously quantify thousands of genes on a single glass slide, and their use has greatly expanded the breadth of quantified gene expression [24]. In our previous work, six wild-type and tmem59 knockout mice were separately immersed in 75% alcohol for disinfection [25,26]. Under aseptic conditions, the hippocampi were made into a single-cell suspension by mechanical whipping. The supernatant was discarded after centrifugation at 900 rpm for 5 min. The hippocampal cells were then resuspended in medium (DMEM/F12 culture medium with B27, EGF and bFGF) and cultured in a glass bottle in a CO2 incubator (5% CO2, 37°C). Gene expression data were measured 4 days later. To understand the biological functions of tmem59, we investigated the genes that were differentially expressed following tmem59 knockout. From the tmem59 knockout microarray datasets, 627 genes that were differentially expressed with more than a 2-fold change were selected as our source of data (data not shown).
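
The 2-fold-change filter described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the function name `select_by_fold_change`, the toy gene names and expression values, and the symmetric treatment of up- and down-regulation are all hypothetical.

```python
def select_by_fold_change(wild_type, knockout, threshold=2.0):
    """Return genes whose knockout/wild-type ratio changes by more than `threshold`-fold.

    Both arguments map gene symbol -> positive (non-log) expression level.
    Up- and down-regulation are treated symmetrically (an assumption).
    """
    selected = {}
    for gene, wt in wild_type.items():
        ko = knockout.get(gene)
        if ko is None or wt <= 0 or ko <= 0:
            continue  # skip genes that cannot be compared
        ratio = ko / wt
        fold = ratio if ratio >= 1 else 1 / ratio
        if fold > threshold:
            selected[gene] = ratio
    return selected


# Hypothetical toy values, for illustration only.
wt = {"geneA": 100.0, "geneB": 80.0, "geneC": 50.0}
ko = {"geneA": 250.0, "geneB": 90.0, "geneC": 20.0}
hits = select_by_fold_change(wt, ko)
```

Storing the raw ratio rather than the fold value keeps the direction of the change available for later ranking steps.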

Significantly expressed genes selection 

To focus on the most significantly expressed genes related to tmem59, we selected 80 genes for further analysis based on the differential ratio following tmem59 knockout. A precise description of the 80 genes and their functions is given in Additional File 1: Table S1.

Public data selection 

To examine the regulatory mechanism between tmem59 and the corresponding genes, it is necessary to integrate many more microarray data sets, which can come from either in-house or public sources. A good resource for public microarray data is the National Institutes of Health Gene Expression Omnibus (GEO), http://www.ncbi.nlm.nih.gov/geo/. All the data used in this study are MIAME compliant and were selected from GEO.

Microarray data normalization 

We converted the probe data to standard gene expression data. Because a single gene is typically represented on the array by a set of 11-20 pairs of probes, we mapped probes to their corresponding Entrez GeneIDs. Affymetrix probes were mapped to Entrez GeneIDs using the 3 Sep 2010 release of the NetAffx annotations. Where probes had multiple GeneID mappings, the one at the top of the GeneID list was selected, because it has been observed that in the majority of such cases the first identifier tends to be the only one with a published symbol, as opposed to one that was automatically generated. We calculated the Average Difference over all the probes of the corresponding gene to compare the expression levels of the probe sets: the more highly a probe set is expressed, the larger its Average Difference. The expression levels of the probe sets mapped to the same gene were then summarized. Probe intensities from Affymetrix oligonucleotide microarrays were normalized to gene expression levels using robust multichip analysis (RMA) [27], which is reported to be the single best normalization method compared with MAS5 (Affymetrix), GCRMA, and dChip PM [28]. The use of ratios or raw intensities is governed by the capabilities of the microarray technology, not by our algorithm.
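
The probe-to-gene step described above can be illustrated with a short sketch. The text only states that probe sets mapped to the same gene were "summarized", so averaging is an assumption here; the function name `summarize_by_gene` and all identifiers and values in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_gene(probe_expression, probe_to_gene_ids):
    """Collapse probe-set expression values to one value per Entrez GeneID.

    probe_expression: probe-set ID -> normalized expression value.
    probe_to_gene_ids: probe-set ID -> ordered list of GeneIDs; the first is used.
    Probe sets of the same gene are averaged (an assumption, see lead-in).
    """
    per_gene = defaultdict(list)
    for probe, value in probe_expression.items():
        gene_ids = probe_to_gene_ids.get(probe)
        if not gene_ids:
            continue  # probe set without any GeneID annotation
        per_gene[gene_ids[0]].append(value)
    return {gene: mean(values) for gene, values in per_gene.items()}


# Hypothetical probe-set IDs, GeneIDs and values, for illustration only.
expression = {"p1": 8.0, "p2": 10.0, "p3": 5.0, "p4": 7.0}
annotation = {"p1": ["111", "222"], "p2": ["111"], "p3": ["333"], "p4": []}
gene_level = summarize_by_gene(expression, annotation)
```

Probe set `p1` has two candidate GeneIDs and is assigned to the first one, mirroring the "top of the list" rule in the text.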

Parallelized SWNI Network inference algorithms 

We designed and evaluated the Stepwise Network Inference (SWNI) algorithm in previous studies [29]. The SWNI algorithm is a rapid and scalable method for reconstructing gene regulatory networks from gene expression measurements without any prior information about gene functions or network structure. It handles the small-sample-size problem of high-dimensional data by strict selection in the stepwise regression model. More precisely, the SWNI algorithm infers a module network in two major stages. First, a model is built with ordinary differential equations to describe the dynamics of a gene expression network under perturbation. Second, a regression subset-selection strategy is adopted to choose significant regulators for each gene. Moreover, statistical hypothesis testing is used to evaluate the regression model. The gene expression network with significant edges and genes is then predicted.

However, the SWNI algorithm is essentially a sequential method. When dealing with a large set of genes, it requires a prohibitive amount of computing time. To overcome this computational burden, we developed a parallel implementation of the SWNI algorithm in this study. Using the message passing interface (MPI), the parallelized SWNI algorithm achieves higher computing efficiency than the sequential SWNI method.

In this study, the multiple public datasets were selected from the same experimental platform as our own microarray data, GPL1261, and were normalized with the RMA algorithm. We subsequently combined all the datasets into a composite training set. A batch adjustment algorithm was applied to the combined training set to ensure that all the datasets were well intermixed [30]. The details of the parallelized SWNI algorithm are as follows.

A gene expression network is expressed by a set of linear differential equations with each gene expression level as a variable:

$$\dot{X} = AX + P,$$

where $A = (a_{ij})_{n \times n}$ is an n x n gene regulatory coefficient matrix that describes the connectivity of genes in the predicted network; X is an n x m matrix of gene expression levels at time t; and $P = (p_{ij})_{n \times m}$ is a matrix representing external stimuli (such as perturbations) or environmental conditions. The computational complexity of the sequential SWNI algorithm is O(n^). In order to reduce the computational cost, we decomposed P by row to partition the parallel tasks.
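
The row-wise decomposition can be pictured as assigning contiguous blocks of rows to processes. The sketch below shows only this partitioning logic, under the assumption that each row can be processed independently of the others; it does not use MPI and is not the authors' implementation, and `partition_rows` is a hypothetical helper name.

```python
def partition_rows(n_rows, n_workers):
    """Split row indices 0..n_rows-1 into n_workers contiguous, near-equal blocks."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


# Ten rows shared among four workers.
blocks = partition_rows(10, 4)
```

In an MPI program, the process with a given rank would work only on its own block, and the per-row results would be gathered at the end.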

Assessment of the parallelized SWNI algorithm 

Artificial gene networks with a random scale-free structure were generated, in which the degree distribution of the vertices follows a power law. The parallelized SWNI algorithm and the SWNI algorithm have the same computing precision, which has been discussed in [29]. The performance of the SWNI algorithm was assessed by comparing the inferred network with the pre-determined artificial network.

The performance of the parallel strategy was evaluated on the artificial gene networks in two important respects: speedup and efficiency. Compared with the SWNI algorithm, the parallelized SWNI algorithm was more efficient, and as the number of processors increased, we obtained almost linear speedups.

RNA Isolation and Real-time RT-PCR analysis 

To study the regulation of tmem59 by pou6f1 and to quantify mRNA by real-time RT-PCR in C17.2 NSCs, we used the ReverTra® Ace qPCR RT kit and SYBR® Green Realtime PCR Master Mix (Toyobo Life Science Department).

For the neural stem cell line, C17.2 cells were plated onto 24-well plates at a density of 5 x 10^ cells per well and cultured at 37°C with 5% CO2 for 24 hours before transfection. After reaching about 90% confluence, cells were split. The murine cerebellum-derived immortalized neural stem cell line C17.2 was originally described by Snyder et al. [31].

The full-length cDNA fragment of Pou6f1 was then amplified by RT-PCR using total RNA from mouse brain. The forward primer was 5'-GAAGATCTATGCCCGGGATCAGCAGTC-3' and the reverse primer was 5'-TCCGGAATTCCGGGATCTGAAAGACGTTC-3'. The cDNA was further digested with Bgl II/EcoR I, subcloned into the pEGFP-N2 vector, and finally sequenced by Invitrogen. A total of 1 μg of pEGFP-N2-Pou6f1 DNA per well was used to transfect C17.2 cells using Lipofectamine 2000 at a ratio of 1:1 (according to the manufacturer's protocol). C17.2 cells transfected with pEGFP-N2 under the same conditions were used as the control group.

Finally, total RNA was isolated from each group according to the standard protocol of the Trizol manufacturer (Takara Bio Inc). PCR primers for amplification of the mouse tmem59 gene were specifically designed (Invitrogen). Chloroform and isopropanol were used to extract and precipitate the total mRNA. RT-PCR analysis was performed on a PE9700 PCR machine. All reactions were repeated three times. The relative quantity of tmem59 mRNA in the cells was calculated using the equation $RQ = 2^{-\Delta\Delta C_t}$. β-actin was used for normalization as the internal control gene, whereas the calibrator was the mean threshold cycle (Ct) value for each control group transfected with the pEGFP-N2 vector. The forward primer sequence for the tmem59 gene is 5'-ATGCTTGTCATCTTGGCTG-3' and the reverse primer sequence is 5'-TCACTTCAGAACGACCTGA-3'. The forward primer sequence for β-actin is 5'-TGTCGCTGTATGCCT and the reverse primer sequence is 5'-TCACGCACGATTTCCCTC-3'.
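
The relative quantification described above follows the widely used 2^(-ΔΔCt) calculation, which can be written out as a short sketch. The function name `relative_quantity` and the Ct values in the example are hypothetical and are not data from this study.

```python
def relative_quantity(ct_target_sample, ct_ref_sample, ct_target_calib, ct_ref_calib):
    """Relative quantity by the 2^(-ddCt) method.

    dCt  = Ct(target gene) - Ct(internal control gene), per group
    ddCt = dCt(sample) - dCt(calibrator)
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calib = ct_target_calib - ct_ref_calib
    delta_delta_ct = delta_ct_sample - delta_ct_calib
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold one cycle earlier
# in the sample than in the calibrator, with an unchanged internal control.
rq = relative_quantity(24.0, 18.0, 25.0, 18.0)
```

A value above 1 indicates higher expression of the target gene in the sample than in the calibrator group; a value below 1 indicates lower expression.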

Statistical analysis 

Statistical analysis and graph creation were performed with SigmaStat 3.5, SigmaPlot 10.0 and Pajek. Data were obtained from at least three independent experiments. Results are presented as means ± SEM. One-way ANOVA was used to analyze the results of real-time PCR. Proportions were analyzed by z-test, and the Yates correction was applied to the calculations.
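
For reference, the "mean ± SEM" summary can be computed as in the sketch below, where the SEM is taken as the sample standard deviation divided by the square root of the number of replicates. The helper name `mean_sem` and the replicate values are hypothetical; the study itself used SigmaStat for these calculations.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for at least two replicates."""
    if len(values) < 2:
        raise ValueError("SEM needs at least two values")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical triplicate measurements, for illustration only.
m, sem = mean_sem([1.0, 2.0, 3.0])
```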

Results 

NSCs related microarrays are selected 

We selected microarrays concerning NSCs, neurogenesis, glia and the central nervous system (CNS), because NSCs are the principal source of constitutive neurogenesis and glia in the CNS. 146 microarray datasets were selected from 21 different platforms. The species, accession numbers, precise descriptions and number of data sets of the 21 platforms are given in Additional File: Table S2. The comparability of gene expression data generated with different microarray platforms is still a matter of concern. Mixing data from various platforms could lead to poor results due to quantitative biases among the technologies [32]. Therefore, we selected only datasets with profiles from a single experimental platform, identified as GPL1261 in the GEO database. In particular, we selected 62 mouse stem cell related data sets for further analysis from the Affymetrix Mouse Genome 430 2.0 Array ([Mouse430_2]), which includes approximately 45,000 probe sets. The 62 mouse NSC-related microarray data sets included in the analysis are listed in Table 1.

The performance of the parallelized SWNI algorithm 

Following the scale-free topology, we simulated two artificial gene networks, one with 1000 nodes and 3054 edges and one with 1500 nodes and 4630500 edges. The performance of the parallelized SWNI algorithm was assessed on the workstation described in the Methods. The speedup and efficiency of the parallel SWNI algorithm are illustrated in Figure 1, and the running time is shown in Table 2. Figure 1 shows that as the network scale increases, the parallelized SWNI algorithm performs better in both efficiency and speedup. Table 2 shows that as the number of processors increases, the computing time of the algorithm falls dramatically. The results demonstrate that the parallelized SWNI algorithm performs well on the artificial gene networks.
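
Speedup and efficiency can be recomputed from the running times in Table 2. The sketch below assumes the standard definitions (speedup = T1 / Tp, efficiency = speedup / p), which the text does not spell out, and uses a subset of the Table 2 rows for the 1000-node network; the function name `speedup_and_efficiency` is hypothetical.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_processors):
    """Return (speedup, efficiency) with speedup = T1 / Tp and efficiency = speedup / p."""
    speedup = t_serial / t_parallel
    return speedup, speedup / n_processors


# Running times for the 1000-node network, copied from Table 2 (subset of rows).
times_1000 = {1: 6501.85, 2: 3670.23, 4: 2193.08, 8: 1122.75, 16: 588.23, 32: 322.09}

results = {
    p: speedup_and_efficiency(times_1000[1], t, p) for p, t in times_1000.items()
}

for p, (s, e) in results.items():
    print(f"{p:>2} processors: speedup {s:6.2f}, efficiency {e:5.2f}")
```

Under these definitions the efficiency for the 1000-node network falls quickly up to four processors and then changes slowly, which matches the description of Figure 1A.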

Gene regulatory networks of mouse neural stem cell 

GRNs related to tmem59 were constructed from a compendium of expression profiles by the parallelized SWNI algorithm (Figure 2). As illustrated in Figure 2A, NSC-GN1 contains 56 genes and 230 edges, with an average degree of 4. In NSC-GN1, tmem59 is shown to be negatively regulated by cd59a and positively regulated by sncg. The global importance of a node in a network can be evaluated by its node degree [33]. The basic principle is that the larger the degree of a node, or the closer the node is to the centre of the network, the more important it is. According to this principle, NSC-GN1 has 22 important nodes whose in-degree is higher than the average degree: aqp1, calml4, cd59a, clic6, cxcl1, cyb561, flvcr2, igfbpl1, lgals3bp, pou6f1, psmbS, s3-12, sncg, arrdc3, axud1, cds1, folr1, gpnmb, paqr9, ptprv, ripk4 and slc35f3. Among these 22 nodes, 9 are more important still, with an in-degree twice as high as the average degree: arrdc3, axud1, cds1, folr1, gpnmb, paqr9, ptprv, ripk4 and slc35f3.
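
The degree-based ranking described above can be sketched on a toy directed graph. Two assumptions are made: the "average degree" is taken as the number of edges divided by the number of nodes (which gives roughly 4 for 230 edges and 56 genes), and a node counts as important when its in-degree exceeds a multiple of that average. The function name `high_in_degree_nodes` and the toy edges are hypothetical.

```python
from collections import Counter


def high_in_degree_nodes(edges, factor=1.0):
    """Return sorted nodes whose in-degree exceeds factor * (edges / nodes).

    edges: list of (regulator, target) pairs of a directed network.
    """
    if not edges:
        return []
    nodes = {node for edge in edges for node in edge}
    in_degree = Counter(target for _, target in edges)
    average = len(edges) / len(nodes)  # assumed definition of average degree
    return sorted(n for n in nodes if in_degree[n] > factor * average)


# Hypothetical toy network: 4 nodes, 6 edges, average degree 1.5.
toy_edges = [("a", "d"), ("b", "d"), ("c", "d"), ("a", "b"), ("d", "c"), ("c", "b")]
important = high_in_degree_nodes(toy_edges)
very_important = high_in_degree_nodes(toy_edges, factor=2.0)
```

Calling the function with `factor=2.0` corresponds to the stricter "twice the average degree" criterion in the text.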

In order to focus on more significant genes, we raised the significance level of the hypothesis testing in the
Table 1 62 mouse neural stem cell related microarray data sets included in the analysis.

Platform: GPL1261 



Accession Type 



Samples Description 



No. 



GSE12499 NSCs 



GSEIC 



GSE11862 
GSE10796 

GSE11859 

GSE8034 

GSE8091 

GSE10577 

GSE9812 



GSE5817 
GDS2209 
GDS1017 

GSE9811

GSE6675 

GSE5425 
GSE5011

GSE1999 

GDS2937 

GDS2846 
GDS2803 
GDS2391 
GDS2096 

GDS1793 
GDS1693 

GDS1635 
GDS1084 

GSE13386 

GSE13385 
GSE13384 



Adult NSCs 



GSE13379 CNSs 



Neurogenesis 
NPCs 

NPCs 

Radial glias 

eNSCs 

Glia 

NPCs 



GSE9763 NPCs 
GSE8555 eNSCs 



eNSCs 

CNS 

CNS 

NPCs 

Astroglia 

CNS 
MSCs 

Neurogenesis 

NPCs 

Neuron 
Neurogenesis 
Neurogenesis 
Cancer cells 

Neuron 
CNS 

Neuron 
eNSCs 

Neuron 

Neuron 
Neuron 



10 
11 

107 

6 
4 

27 

17 

16 
12 
22 

20 
8 

21 
6 
15 

42 



10 

15 



39 

20 
8 

24 

9 
6 



Oct4-Induced Pluripotency in Adult NSCs

Pluripotent SCs induced from adult NSCs by 

reprogramming with two factors 

Application of a translational profiling approach for the 

comparative analysis of CNS cell types 

Early Gene expression changes after axonal injury 

Identification of genes that restrict astrocyte differentiation 

of midgestational neural precursor cells 

Acquisition of granule neuron precursor identity and 

Hedgehog-induced medulloblastoma in mice 

Prospective isolation of functionally distinct radial glial 

subtypes - lineage and transcriptome analysis 

Embryonic brain development 

endothelin signaling from photoreceptors to glia 

Molecular heterogeneity of developing retinal ganglion and 

amacrine cells 

Transformed glial progenitor cells 

D-3-phosphoglycerate dehydrogenase deficiency effect on 

the embryonic head 

understanding the process of cortical development 
Spinal cord and dorsal root ganglion 
Hypoxic-ischemic injury response to erythropoietin 
pretreatment 

Individual retinal progenitor cells display extensive 

heterogeneity of gene expression 

Astroglial gene expression program elicited by 

fibroblast growth factor-2 mande-affy-mouse-307080 

Spinal cord and dorsal root ganglion 

molecular changes the MSC acquire through in-vitro

passages determine their therapeutic potential to EAE 

Hypoxic-ischemic injury response to erythropoietin 

pretreatment 

Olfactory marker protein deficiency effect on the 
olfactory epithelium 

MicroRNA miR-124 expression effect on neuronal cell line 

Fluoxetine effect on the hippocampus 

PGC-1 alpha transcriptional coactivator null mutation 

Glucocorticoid receptor activation effect on breast cancer 

cells 

Homeodomain interacting protein kinase 2 dominant-negative form effect on trigeminal ganglion 

Transcription factor Nrl deficiency effect on photoreceptor 

development 

Nodose and dorsal root ganglia comparison 
Homeobox Dlx1/2 mutations effect on embryonic
telencephalon 

Comparative analysis of Drd1+ Medium Spiny Neurons,
Drd2+ Medium Spiny Neurons, cocaine treatment
Table 1 62 mouse neural stem cell related microarray data sets included in the analysis. (Continued)



1 J JO/ 


INcUl Ul 1 


Z4 






LINO 


Od 


A translational profiling approach for the molecular 








characterization of CNS cell types 








characterization of CNS cell types 


1 DD/y 




1 07 
1 U/ 


AppiiLaLion oi a Liaiibiauonai pioiiiiny appioacn loi int: 








comparative analysis of CNS cell 


Ujt 1 1 ZJO 


Neuron


z4 


Npas4-regulated genes in mouse hippocampal neurons 


bob 1 u/yo 


NrLS 


4 


Identification of genes that restrict astrocyte differentiation 








of midgestational neural precursor cells 




MPrc 

INrLS 


07 
z/ 


Acquisition of granule neuron precursor identity and 








ncUycllOy IMUUCcU 1 1 IcUUIIUUIdbLOl lid III lIllLc 




Rsclisl qIis 


1 7 
1 / 


Prospective isolation of functionally distinct radial glial 








subtypes 


UjC I 1 zU/ 


Neuron


0 


Dorsal root ganglion 


1 1 1 4 1 


In jL-b 


Z 1 


mitrLLS OI INyn OVtrl trXpi trbblOl 1 Oil Lilt: UcVtriUpiliy dllU 








lIldLUIc lOlcUldlll 


uot 1 UoOU 


neuron 


z 


Role of Endothelin in SCG axon pathfinding 




eiNoLS 




PDGF-B induces a homogeneous class of








unyuuci lui uy iiui 1 idb iiuiii ciiiuiyuiiiL iicuidi piuyciiiLuib 


uobyoUo 


Neurogenesis 


y 


Striatal gene expression data from 12 weeks-old R6/2 mice 








and control mice 


UotyoU4 


Neurogenesis 


o 

y 




UotyjoU 


Neurogenesis 


p 

o 


• 

Transcription factor Ctip2 deficiency effect on brain striata 


UoLdj4U 


Neurogenesis 


1 z 


Expression data from olfactory epithelium of Lip-C-treated 








mice compared to Lip-O-treated control mice 


*o jiiy44j 




1 J 1 


jlccp (JcpilVdLIOII dllU Lllc Uldlll 


r;cp^/ipc 
>ajliD4oj 


INeUl Oycl Icbib 


D 


LXpiebblOII UdLd llOlil OlldCLOiy epiUlcllUlil OI ndlltryuill 








mutant mice compared to littermate controls 


r;c:pQ7^n 




1 9 
1 z 


milUiyUIIIL bLclll Lcllb VVILII cXpdllUcU IcptrdLb 


'aotD4/D 


Neuron 


/I 
4 


Fluoxetine effect on the hippocampus


r;cppo-| 1 
*ajlioj 1 1 


iNeui oyei lebib 


J 


UIX 1 lOI I leOUOl I Idll 1 LIdl IbCl ipUOl 1 IdCLOl lllULdllLb 


*O_)II0UZ4 


K\<zr per 


p 
o 


IVIUIIIIt: Lj Ccllb, lltrUldl picCUIbOl Ccllb dllU t:l 1 lUl yOl IIC 








\ lUl OUIdSLS 


r'QP/lQ'77 


Neurogenesis 


D 


Olfactory marker protein deficiency effect on the olfactory 








epithelium 


r;cp^77c 
UotDz/ J 


J) 


jD 


neuronal dysfunction associated with the ataxic and epileptic 








pi ici loiypcb 


GSE4774    Development     15    To determine how Dlx homeobox genes function
GSE4752    CNS             6     Target genes regulated by Egr transcriptional regulators
GSE4051    Diff.           8     Photoreceptor-specific nuclear receptor NR2E3 ectopic expression effect on NRL null retinas
GSE4041    CNS             6     Abnormal expression patterns in NR3B-null mice
GSE2873    Diff.           4     Skeletal muscle synaptic region
GSE2869    Neurogenesis    8     Homeodomain interacting protein kinase 2 dominant-negative form effect on trigeminal ganglion
GSE2161    eNSCs           8     Homeobox Dlx1/2 mutations effect on embryonic telencephalon



NPCs: Neural progenitor cells; eNSCs: Embryonic neural stem cells; MSCs: Mesenchymal stem cells; ESC: Embryonic stem cells; Diff.: Differentiation.



parallelized SWNI algorithm to delete nodes with lower significance. NSC-GN1 was further reduced to a sparser network, called NSC-GN2 (Figure 2B). It contains nodes and edges with higher positive and negative rates than the nodes and edges in NSC-GN1. 36 genes have a significant relationship with
Figure 1 The efficiency and speedup curves of the parallel SWNI algorithm. Both efficiency and speedup are calculated on two samples: a network of 1000 nodes and a network of 1500 nodes. Both panels are plotted against the number of processors (0 to 32). (A) The efficiency of the algorithm drops dramatically when the number of processors increases from one to four and then tends to stabilize. (B) The speedup is closer to a straight line for the network of 1500 nodes than for the network of 1000 nodes.



tmem59, and 46 significant regulatory relationships were identified in NSC-GN2, whose average node degree is 1.2. Pou6f1 regulates 11 genes in NSC-GN2, suggesting that it is the most important gene in the network. Rnd3 and cds1 are each related to 5 different genes. It is worth mentioning that three genes are found to regulate tmem59. In other words, tmem59 is negatively
Table 2 Computing time of the parallel SWNI algorithm for two types of networks with an increasing number of processors.

Number of processors    Network of 1000 nodes    Network of 1500 nodes
1                       6501.85                  48102.6
2                       3670.23                  24528.4
4                       2193.08                  12528.7
6                       1485.62                  8457.89
8                       1122.75                  6405.23
10                      908.12                   5139.61
12                      769.32                   4280.88
14                      666.19                   3685.25
16                      588.23                   3254.62
18                      525.86                   2887.79
20                      479.17                   2621.81
22                      439.52                   2399.3
24                      406.73                   2217.95
26                      383.95                   2054
28                      357.58                   1921.4
30                      338.8                    1780.58
32                      322.09                   1689.94

We simulated two artificial gene networks, one with 1000 nodes and 3054 edges and one with 1500 nodes and 4630500 edges, to assess the performance of the parallelized SWNI algorithm, and calculated the computing time. The results show that as the number of processors increases, the computing time of the algorithm falls dramatically. This suggests that the parallelized SWNI algorithm performs well on the artificial gene networks.



regulated by cd59a, while positively regulated by sncg and myrip. Both cd59a and sncg were also found in NSC-GN1.

Combined with published data, we constructed an integrated network containing both gene regulations and protein-protein interactions, with 68 nodes and 98 edges (NSC-GN3, illustrated in Figure 2C). The average node degree of NSC-GN3 is 1.4. NSC-GN3 includes 39 genes, 29 encoded proteins, 66 regulatory relationships and 32 protein-protein interactions. NSC-GN3 thus partially shows the gene regulatory relationships of mouse NSCs and the differentiation mechanism of NSCs at the protein level.

Novel regulatory pathways 

We used the predicted regulatory network of mouse NSCs to infer new gene interactions. We rearranged the nodes of NSC-GN2 to obtain NSC-GN4 (Figure 2D). From NSC-GN4, four pathways related to the expression of tmem59 were clearly identified:

Pou6f1-Cd59a-Tmem59,

Pou6f1-Sncg-Tmem59,

Pou6f1-Wfdc2-Rnd3-Mgp-Myrip-Tmem59, and

Pou6f1-Wfdc2-Rnd3-Sncg-Tmem59.

All four pathways start from the transcription factor pou6f1. Moreover, the expression of tmem59 is regulated directly by myrip, sncg and cd59a, all of which are regulated by pou6f1 directly or indirectly.
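
The four pathways can be recovered mechanically by enumerating all simple directed paths from pou6f1 to tmem59. The sketch below encodes only the edges that appear in the four pathways listed above, not the full NSC-GN4 network, and uses a plain depth-first search; the function name `all_simple_paths` is a hypothetical helper.

```python
def all_simple_paths(graph, source, target, path=None):
    """Enumerate all simple (cycle-free) directed paths from source to target."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    paths = []
    for successor in graph.get(source, []):
        if successor not in path:  # never revisit a node on the current path
            paths.extend(all_simple_paths(graph, successor, target, path))
    return paths


# Regulator -> targets, restricted to the edges named in the four pathways.
graph = {
    "pou6f1": ["cd59a", "sncg", "wfdc2"],
    "cd59a": ["tmem59"],
    "sncg": ["tmem59"],
    "wfdc2": ["rnd3"],
    "rnd3": ["mgp", "sncg"],
    "mgp": ["myrip"],
    "myrip": ["tmem59"],
}

paths = all_simple_paths(graph, "pou6f1", "tmem59")
for p in paths:
    print("-".join(p))
```

On the full network the same search would also report any additional routes, so the result depends on which edges are retained after significance filtering.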

A novel regulator, pou6f1, regulates the expression of tmem59

In Figure 2D, pou6f1 is identified as a densely connected node, hinting that pou6f1 may play an important role in tmem59 expression. To test this hypothesis, we constructed an expression vector to over-express
Figure 2 Predicted gene regulatory networks related to tmem59. (A) NSC-GN1 is predicted to be a network of 56 genes and 230 edges. (B) NSC-GN2 is predicted to be a network of 37 genes and 46 regulations. (C) NSC-GN3 is identified as a combined network of 39 genes and 29 proteins with 66 regulations and 32 protein-protein interactions. Dark nodes are genes, while light nodes are proteins. (D) NSC-GN4 is extracted from (B) to focus on the precise pathways directed to tmem59 from pou6f1.



transcription factor POU6F1 fused with EGFP (pEGFP-N2-POU6F1) for real-time observation and quantification in C17.2 NSCs. The results suggested that POU6F1, a transcription factor, was successfully expressed in the nucleus of NSCs, in contrast to the ubiquitous localization of EGFP (Figure 3A, B, C, D). C17.2 NSCs transfected with the pEGFP-N2 vector were used as a control group. C17.2 NSCs showed a statistically significant 37.06% ± 4.31% (P < 0.01) increase in tmem59 expression caused by the over-expression of pou6f1 (Figure 3E). This study is the first to identify a regulator, pou6f1, that may account for tmem59 expression.

Localization of tmem59 related genes and identification 
of functional-related gene groups 

In NSC-GN2 (Figure 2B), 36 genes were predicted to be
related to tmem59 and 27 of them are annotated in Gene
Ontology (GO). Among the 27 annotated proteins, 4, 17, 2
and 4 proteins are localized to the plasma, membrane,
nucleus and extracellular space, respectively. Figure 4
illustrates that 10.8%, 46.0%, 5.4% and 10.8% of the 37
proteins in NSC-GN2 are localized to these sites, with
the remaining 27% unannotated.
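
The localization shares can be checked with a few lines of arithmetic. The sketch below assumes a membrane count of 17, which is the value that makes the four categories sum to the 27 annotated proteins. It also computes the non-plasma share of the annotated proteins, which can be compared with the "more than 85%" figure quoted in the Discussion.

```python
# Counts of proteins per sub-cellular location in NSC-GN2.
# The membrane count (17) is an assumption: it is the value that makes the
# four categories add up to the 27 GO-annotated proteins.
counts = {"plasma": 4, "membrane": 17, "nucleus": 2, "extracellular": 4}
total_genes = 37

annotated = sum(counts.values())
unannotated = total_genes - annotated

# Share of each location among all 37 proteins, in percent
shares = {site: round(100 * n / total_genes, 1) for site, n in counts.items()}
shares["unannotated"] = round(100 * unannotated / total_genes, 1)

# Share of non-plasma proteins among the annotated proteins only
non_plasma_share = round(100 * (annotated - counts["plasma"]) / annotated, 1)

for site, pct in shares.items():
    print(f"{site}: {pct}%")
print(f"non-plasma among annotated: {non_plasma_share}%")
```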

As mentioned above, the novel membrane protein
TMEM59 modulates complex glycosylation. Based on GO
annotation, 42% of the GO-annotated proteins in NSC-GN2
are involved in metabolism, including TMEM59 (Table 3),
suggesting that many of the genes are functionally
similar to tmem59. In addition, more than 20% of them are
reported to transport materials within cells. The analysis
of the tmem59-related GRN of mouse NSCs highlights new
candidate genes involved in (i) peptidase, hydrolase,
kinase and transferase activity; (ii) transport of water,
lipids and metal ions; (iii) protein binding; and
(iv) transcription.

Identification of Alzheimer's disease related genes 

It is interesting to ask how many genes in the tmem59-
related GRN (NSC-GN2) could be related to Alzheimer's
disease (AD). Epigenetic profiling has revealed that
TMEM59 is down-regulated and hypomethylated in major
psychosis [34]. The maturation and localization of
amyloid precursor protein (APP) are reported to be
modulated by TMEM59 [35]. APP is crucial in AD
pathogenesis, which is often accompanied by psychotic
disorders. In NSC-GN2, Cd59a, myrip and sncg are the
three genes that directly regulate tmem59, and all three
have been reported to be AD-related. In NSC-GN2, our
study showed that 17 out of 37 predicted genes
(including tmem59) are related to AD: Ace [36], aqp1
[37], arrdc3 [38], cd14 [39], cd59a [40], cds1 [41],
cldn1 [42], cox8b [43], defb11 [44], folr1 [45], gdi2
[46], mmp3 [47], mgp [48], myrip [49], Ripk4 [50], rnd3
[51,52], and sncg [53].

Figure 3 Positive regulation of tmem59 by pou6f1 is evident in C17.2 NSCs. (A, B) Grayscale images were captured in white light. (C, D) POU6F1-
EGFP (green) was over-expressed in the nucleus of C17.2 neural stem cells. Images were captured 36 hours after transfection of the pEGFP-
N2-Pou6f1 plasmid. (E) Real-time PCR showed that tmem59 is up-regulated by over-expression of pou6f1 (normalized to β-actin). N = 3, **P <
0.01. OE: over-expression of POU6F1-EGFP; Con: control group transfected with pEGFP-N2.



Discussion 

Tmem59 has been reported to sustain the status of
NSCs in vitro. Knockout of tmem59 in mouse brain can
induce expression changes in 627 genes in neonatal
mouse NSCs. Until now, the underlying function of
tmem59, especially in the differentiation of mouse
NSCs, has remained unclear. In this study, we sought
regulators likely to affect gene expression in mouse
NSCs, and new mechanisms of neurodegeneration in AD,
from a compendium of expression profiles.

Firstly, 36 genes were identified as tmem59-related.
In the predicted network NSC-GN2, tmem59 is regu-
lated directly by cd59a, myrip and sncg. Meanwhile, four
pathways from pou6f1 that regulate the expression of
tmem59 were found in NSC-GN2. Tmem59 is located
downstream in all the pathways, indicating that tmem59




Figure 4 Location of proteins in the Tmem59-related regulatory network. (A) Distribution of proteins at the sub-cellular level. Most of the proteins
were located in the membrane. (B) Non-plasma proteins (located in the membrane, nucleus and extracellular space) were significantly more numerous than plasma proteins.
Unknown: no annotation information in GenBank; ***p < 0.001.
Table 3 Function of the 37 differentially expressed genes identified in NSC-GN2.

Function              Number of genes  GO accession  Entrez Gene
Enzymatic activity    12               [illegible]   Ace
                                       GO:0016740    Cd14
                                       GO:0016740    Cds1
                                       GO:0016491    Cox8b
                                       [illegible]   Gdi2
                                       [illegible]   Mmp3
                                       [illegible]   Myrip
                                       [illegible]   [illegible]
                                       [illegible]   Ptprv
                                       [illegible]   Ripk4
                                       [illegible]   Tmem59
                                       [illegible]   Wfdc2
Transporter activity  6                [illegible]   Aqp1
                                       GO:0046872    Cyb561
                                       [illegible]   Folr1
                                       [illegible]   Mgp
                                       [illegible]   [illegible]
                                       [illegible]   Slc6a13
Protein binding       5                [illegible]   Cd59a
                                       [illegible]   Cldn1
                                       [illegible]   Gpnmb
                                       [illegible]   Paqr9
                                       [illegible]   Rec8
Molecular function    4                [illegible]   Arrdc3
                                       GO:0003700    Pou6f1
                                       GO:0000166    Rnd3
                                       GO:0003674    Sncg
Cytoskeleton          1                GO:0007010    Krt[illegible]
Unknown               9                None          1110059M19Rik
                                       None          2900017F05Rik
                                       None          Axud1
                                       None          Calml4
                                       None          C230095G01Rik
                                       None          Defb11
                                       None          S3-12
                                       None          Slc35f3
                                       None          Tinagl

The functions of the 37 genes in NSC-GN2 were classified on the basis of GO
annotation. The function, number of genes, GO accession and Entrez Gene symbol
in GenBank are listed. Among the GO-annotated proteins, 42% function in
metabolism, including TMEM59, and more than 20% function in transporting
materials within cells.



is probably regulated by all the other genes. These con-
clusions are in accordance with observations from ear-
lier studies [23]. Our study suggests that the 36 genes
probably act on the differentiation of NSCs and have
functions similar to that of tmem59.
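
Statements such as "four pathways lead from pou6f1 to tmem59" and "tmem59 is regulated directly by cd59a, myrip and sncg" correspond to simple queries on a directed graph. The sketch below shows one way to run such queries with a depth-first search. The edge list is a hypothetical toy network: only the three direct regulators of tmem59 are taken from the text, and the intermediate nodes geneA and geneB are placeholders, not edges of NSC-GN2.

```python
from collections import defaultdict

def build_graph(edges):
    """Adjacency list for a directed regulator -> target network."""
    graph = defaultdict(list)
    for regulator, target in edges:
        graph[regulator].append(target)
    return graph

def simple_paths(graph, source, sink, path=None):
    """Yield every cycle-free path from source to sink (depth-first search)."""
    path = (path or []) + [source]
    if source == sink:
        yield path
        return
    for nxt in graph.get(source, []):
        if nxt not in path:  # skip nodes already on the path to avoid cycles
            yield from simple_paths(graph, nxt, sink, path)

def direct_regulators(edges, target):
    """Sorted list of genes with an edge pointing straight at the target."""
    return sorted({reg for reg, tgt in edges if tgt == target})

# Hypothetical toy edges; geneA and geneB are placeholders
toy_edges = [
    ("pou6f1", "geneA"), ("pou6f1", "geneB"),
    ("geneA", "cd59a"), ("geneA", "myrip"),
    ("geneB", "myrip"), ("geneB", "sncg"),
    ("cd59a", "tmem59"), ("myrip", "tmem59"), ("sncg", "tmem59"),
]
graph = build_graph(toy_edges)
paths = list(simple_paths(graph, "pou6f1", "tmem59"))
for p in paths:
    print(" -> ".join(p))
print(direct_regulators(toy_edges, "tmem59"))
```

A gene that has no outgoing edges and is reachable from every other node, as tmem59 is in this toy graph, is what the text means by being located downstream in all pathways.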



Secondly, our RT-PCR results showed that
tmem59 is positively regulated by pou6f1. pou6f1
has been reported to play an important role during the
development of the mouse telencephalon [54]. Our study
suggests that the influence of pou6f1 on mouse telence-
phalon development originates from its effect on
NSCs during mouse embryonic development. This
study provides further insight into the dif-
ferentiation of NSCs.

Thirdly, our study suggests that TMEM59 has a
localization similar to that of most of its regulators.
Recently, TMEM59 was reported to be a Golgi-localized
protein that is crucial in modulating complex
glycosylation, cell surface expression and secretion of
amyloid precursor protein [34]. As is known, proteins of
the cell plasma are synthesized directly on free
ribosomes, while membrane proteins and some other
proteins, including those transferred to the nucleus, are
synthesized on the rough endoplasmic reticulum. The
second type of protein is then transported to its
subcellular location via the Golgi complex. Among the 27
annotated genes in the predicted network NSC-GN2, more
than 85% were identified as non-plasma localized. This
suggests that about 85% of the 27 proteins are
Golgi-localized during maturation and have a localization
similar to that of TMEM59.

Furthermore, our study suggests that the tmem59-
related gene regulatory network (NSC-GN2) is probably
AD-related. Mutation of the gene encoding β-amyloid
precursor protein (APP), the precursor of β-amyloid
protein (Aβ), is described as the first genetic mutation
linked to AD. The deposition of Aβ in brain plaques has
been identified as a cause of AD. As reported, TMEM59 is
Golgi-localized in the Hek293 cell line and modulates the
complex glycosylation, cell surface expression and
secretion of APP. That study indicates that TMEM59 may be
associated with AD. In our predicted mouse NSC-related
network NSC-GN2, the three genes that regulate Tmem59
directly are sncg, cd59a and myrip. Sncg (γ-synuclein)
has been correlated with dementia in the hippocampus in
AD and with the pathology of Parkinson's disease (PD)
[55]. Deficiency of the complement regulator cd59a has
been reported as a cause of neurodegeneration in AD [56].
The Rab27-binding protein MYRIP is involved in insulin
exocytosis, impairment of which is implicated in the
pathogenesis of AD [57,58]. In addition, nearly 50% of
the genes in NSC-GN2 have been reported to be directly or
indirectly related to AD. Therefore, tmem59, which is
directly regulated by cd59a, myrip and sncg, is suggested
to be associated with AD, and the genes in NSC-GN2 not yet
reported are probably related to AD as well.

Conclusions 

In this study, we predicted mouse NSC-related
GRNs with the parallelized SWNI algorithm, integrating
data from the tmem59 knock-out microarray datasets
and 62 mouse stem cell related microarray datasets in
GEO. The parallelized SWNI algorithm increased the
efficiency of network reconstruction significantly. In par-
ticular, a high-confidence network of mouse NSCs (NSC-
GN2) was predicted. In the network, 36 key genes regu-
lating tmem59 expression were identified. The RT-PCR
result suggested that tmem59 can be significantly and
positively regulated by pou6f1. Moreover, 17 out of 36
genes in our network, including tmem59, are predicted to
be AD-related. This is consistent with published
references.

The present work provides new insights into the
gene regulation of NSCs. The parallel methods pre-
sented in this paper might also become a scalable tool
for large-scale analysis of various cell types and spe-
cies. Integration of multiple datasets will open
new research directions in microarray analysis. This
study enables us to highlight novel genes that may be
involved in NSC differentiation and provides a shortcut
to identifying genes for AD.

Additional material 



Additional file 1: Table S1. List of 80 selected genes from the
tmem59 knock-out microarray experiment included in the analysis.
From the tmem59 knock-out microarray datasets, 627 genes that were
differentially expressed with more than 2-fold change were selected as
our source of data. To focus on the genes most significantly affected by
tmem59, we selected 80 genes for further analysis
based on the Differential Ratio following tmem59 knock-out. The symbol,
Gene ID and function of each gene can be searched in GenBank.

Additional file 2: Table S2. The 21 platforms related to 146
microarray datasets about mouse NSCs. Microarrays about NSCs,
neurogenesis, glia and the central nervous system (CNS) were selected,
because NSCs are the principal source of constitutive neurogenesis and glia
in the CNS. 146 microarray datasets were selected from 21 different
platforms for constructing the gene regulatory network of mouse NSCs. The
species, accession numbers, precise descriptions and number of datasets
of the 21 platforms are listed.